System and method for remote measurements of vital signs

ABSTRACT

A remote photoplethysmography (RPPG) system includes an input interface to receive a sequence of measurements of intensities of different regions of a skin of a person indicative of vital signs of the person; a solver to solve an optimization problem to determine frequency coefficients of photoplethysmographic waveforms corresponding to the measured intensities at the different regions, wherein the solver determines the frequency coefficients to reduce a distance between intensities of the skin reconstructed from the frequency coefficients and the corresponding measured intensities of the skin while enforcing joint sparsity on the frequency coefficients; and an estimator to estimate the vital signs of the person from the determined frequency coefficients of photoplethysmographic waveforms.

TECHNICAL FIELD

This invention relates generally to remotely monitoring vital signs of a person and more particularly to remote photoplethysmographic (RPPG) measurements of the vital signs.

BACKGROUND

Vital signs of a person, for example the heart rate (HR), heart rate variability (HRV), the respiration rate (RR), or the blood oxygen saturation, serve as indicators of the current state of a person and as a potential predictor of serious medical events. For this reason, vital signs are extensively monitored in inpatient and outpatient care settings, at home, and in other health, leisure, and fitness settings. One way of measuring vital signs is plethysmography. Plethysmography generally refers to the measurement of volume changes of an organ or a body part and in particular to the detection of volume changes due to a cardiovascular pulse wave traveling through the body of a person with every heartbeat.

Photoplethysmography (PPG) is an optical measurement technique that evaluates a time-variant change of light reflectance or transmission of an area or volume of interest, which can be used to detect blood volume changes in the microvascular bed of tissue. PPG is based on the principle that blood absorbs and reflects light differently than surrounding tissue, so variations in blood volume with every heartbeat affect transmission or reflectance correspondingly. PPG is often used non-invasively to make measurements at the skin surface. The PPG waveform includes a pulsatile physiological waveform attributed to cardiac synchronous changes in the blood volume with each heartbeat, and is superimposed on a slowly varying baseline with various lower frequency components attributed to respiration, sympathetic nervous system activity, and thermoregulation. Although the origins of the components of the PPG signal are not fully understood, it is generally accepted that they can provide valuable information about the cardiovascular system.

Conventional pulse oximeters for measuring the heart rate and the (arterial) blood oxygen saturation of a person are attached to the skin of the person, for instance to a fingertip, earlobe or forehead. Therefore, they are referred to as ‘contact’ PPG devices. A typical pulse oximeter can include a combination of a green LED, a blue LED, a red LED, and an infrared LED as light sources and one photodiode for detecting light that has been transmitted through patient tissue. Commercially available pulse oximeters quickly switch between measurements at different wavelengths and thereby measure the transmissivity of the same area or volume of tissue at different wavelengths. This is referred to as time-division-multiplexing. The transmissivity over time at each wavelength gives the PPG waveforms for different wavelengths. Although contact PPG is regarded as a basically non-invasive technique, contact PPG measurement is often experienced as being unpleasant, since the pulse oximeter is directly attached to the person and any cables limit the freedom to move.

Recently, non-contact, remote PPG (RPPG) devices for unobtrusive measurements have been introduced. RPPG utilizes light sources or, in general, radiation sources disposed remotely from the person of interest. Similarly, a detector, e.g., a camera or a photo detector, can be disposed remotely from the person of interest. Therefore, remote photoplethysmographic systems and devices are considered unobtrusive and well suited for medical as well as non-medical everyday applications.

One of the advantages of camera-based vital signs monitoring versus on-body sensors is the high ease-of-use: there is no need to attach a sensor; just aiming the camera at the person is sufficient. Another advantage of camera-based vital signs monitoring over on-body sensors is the potential for achieving motion robustness: cameras have greater spatial resolution than contact sensors, which mostly include a single-element detector.

One of the challenges for RPPG technology is to be able to provide accurate measurements under motion/light distortions. Several methods have been developed to enable robust camera-based vital signs measurement. For such measurements, usually a plurality of signals is captured based on image processing of a captured image sequence. The plurality of signals may originate from different pixels of a sensor corresponding to different regions of a skin of a person and/or from different color channels of one pixel corresponding to the same spatial position. Then, photoplethysmographic waveforms are derived from the plurality of the signals. These photoplethysmographic waveforms are indicative of the vital signs of a person that can be determined by further analysis of the waveforms.

However, the quality of the photoplethysmographic waveforms is degraded to an extent determined by the value of the signal-to-noise ratio (SNR) of the sensed measurements. Low SNR due to light variations and false peaks in the photoplethysmographic waveforms due to motion have the potential to confound the PPG signal. To address these challenges, some methods perform an extensive analysis of a plurality of photoplethysmographic waveforms. For example, a method described in U.S. 2016/0220128 estimates a weighted combination of the plurality of the photoplethysmographic waveforms to reduce outliers caused by the noise. However, such an approach may not remove enough noise from the signal and may not always lead to an optimal result.

Accordingly, there is a need to reduce the sensitivity of the RPPG estimation to noise in the measurements of intensities (e.g., image pixel intensities) of a skin of a person.

SUMMARY

Some embodiments are based on recognition that the sensitivity of remote photoplethysmography (RPPG) to noise in the measurements of intensities (e.g., pixel intensities in camera images) of a skin of a person is caused at least in part by independent derivation of photoplethysmographic waveforms from the intensities of a skin of a person measured at different spatial positions. Some embodiments are based on recognition that at different positions, e.g., at different regions of the skin of the person, the measurement intensities can be subjected to different and sometimes even unrelated noise. To that end, photoplethysmographic waveforms that are estimated independently from intensities of different regions of the skin of a person may fail to assist each other in identifying such noise.

Some embodiments are based on recognition that the effect of the noise on the quality of the RPPG estimation can be reduced by collectively estimating different photoplethysmographic waveforms of intensity of a skin of a person measured at different regions of the skin. Indeed, when RPPG is used to estimate a vital sign of a person, e.g., a heart rate, the heartbeat is a common source of intensity variations present in all regions of the skin. To that end, it can be beneficial to estimate the photoplethysmographic waveforms of different regions collectively, i.e., using a common metric.

Some embodiments are based on recognition that two types of noise are acting on the intensities of the skin, i.e., external noise and internal noise. The external noise affects the intensity of the skin due to external factors such as lighting variations, motion of the person, and resolution of the sensor measuring the intensities. The internal noise affects the intensity of the skin due to internal factors such as different effects of cardiovascular blood flow on the appearance of different regions of the skin of a person. For example, the heartbeat can affect the intensity of the forehead and cheeks of a person more than it affects the intensity of the nose.

Some embodiments are based on realization that both types of noise can be addressed in the frequency domain of the intensity measurements. Specifically, the external noise is often non-periodic or has a periodic frequency different than that of the signal of interest (e.g., the pulsatile signal), and thus can be detected in the frequency domain. In addition, the internal noise, while resulting in intensity magnitude variations or time-shifts of the intensity variations in different regions of the skin, preserves the periodicity of the common source of intensity variations in the frequency domain. To that end, some embodiments are based on realization that the common metric used to estimate the photoplethysmographic waveforms needs to be enforced in the frequency domain of the intensity measurements, rather than in the domain of the intensity measurements themselves.

Some embodiments are based on recognition that if the frequency coefficients of the photoplethysmographic waveforms are directly derived from the intensity measurements, the enforcement of the common metric in frequency domain on such a direct estimation can be problematic. However, it can be advantageous to enforce the common metric during the estimation of the frequency coefficients rather than after the frequency coefficients are estimated. To that end, some embodiments utilize an optimization framework to reconstruct the frequency coefficients of the photoplethysmographic waveforms to match the measured intensities, rather than to directly compute the frequency coefficients from the measured intensities. Such a reverse direction in the estimation of the frequency coefficients allows performing the reconstruction subject to constraints that can enforce the common metric on the frequency coefficients of different photoplethysmographic waveforms of different regions. Because such a reconstruction reverses a direction of the direct estimation of the frequency coefficients from the intensity measurements, such a reconstruction is referred to herein as a reverse reconstruction.

Some embodiments are based on realization that, in the frequency domain, the common metric enforced on different photoplethysmographic waveforms can be joint sparsity of the frequency coefficients of the photoplethysmographic waveforms. The joint sparsity of the frequency coefficients forces different photoplethysmographic waveforms to be sparse together in the same frequency bins and/or to have large energy only in the same frequency bins. Such a joint sparsity adequately reflects the notion of the common source of intensity variations, and can jointly assist different photoplethysmographic waveforms to remove different and potentially unrelated outliers caused by the external and internal noise, making the RPPG less sensitive to noise.

To that end, some embodiments determine the frequency coefficients of photoplethysmographic waveforms of intensity signals of different regions of a person's skin in a way that minimizes the difference between the corresponding intensity signals estimated using the determined frequency coefficients and the measured intensity signals, while enforcing the joint sparsity on the determined frequency coefficients. For example, some embodiments estimate the intensity signals using an inverse Fourier transformation of the determined frequency coefficients. Such a reverse reconstruction allows reducing the sensitivity of the RPPG estimation to the measurement noise.

Some embodiments are based on recognition that the determination of the frequency coefficients can be represented as an optimization, e.g., a minimization, problem while the enforcement of the joint sparsity can be represented as a constraint on the optimization. In such a manner, the computational requirement for finding the frequency coefficients can be reduced. Some embodiments are based on recognition that the constraint can be enforced as a hard constraint prohibiting its violation or as a soft constraint penalizing its violation. Some embodiments enforce the hard constraint when the periodicity of the recovered vital sign is desired. Otherwise, an embodiment enforces the soft constraint. For example, one embodiment enforces the joint sparsity as a soft constraint for measuring the heart rate.

Some embodiments enforce the joint sparsity as a soft constraint by including in the optimization a weighted two-one norm of the frequency coefficients of different photoplethysmographic waveforms. The two-one norm component of the optimization promotes joint sparsity of frequency coefficients, while the weight of the two-one norm component determines the penalty for violation of this soft constraint. Thus, the weight of the two-one norm component can be used to vary a number of non-zero frequency coefficients, depending on the type of vital sign being measured.
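As an illustration of the weighted two-one norm described above, the following minimal sketch computes the norm and a row-wise shrinkage that is commonly paired with it. The array layout (one row per frequency bin, one column per skin region), the function names, and the use of row-wise soft thresholding are assumptions made for this example and are not taken from the description.

```python
import numpy as np

def two_one_norm(X):
    """Two-one norm of X, assumed shape (num_bins, num_regions).

    Takes the l2 norm across regions for each frequency bin (row),
    then sums those norms over all bins.
    """
    return float(np.sum(np.linalg.norm(X, axis=1)))

def row_soft_threshold(X, weight):
    """Shrink every row of X toward zero by `weight` (assumed prox step).

    Rows whose l2 norm is below `weight` become exactly zero, so all
    regions end up sharing the same set of non-zero frequency bins.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - weight / np.maximum(norms, 1e-12), 0.0)
    return X * scale
```

In this sketch a larger weight zeroes more rows, which corresponds to the statement above that the weight can be used to vary the number of non-zero frequency coefficients.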

Some embodiments acquire the intensity measurements as continuous signals over a significant time period, e.g., minutes, hours or even days. To reduce the computational requirements, some embodiments determine the vital signs sequentially for a sequence of temporal segments of those continuous signals. Those segments can be overlapping or adjoining to each other. However, some embodiments are based on recognition that the reverse reconstruction of the photoplethysmographic waveforms from the frequency coefficients can introduce discontinuities in the estimation of vital signs across different segments of the continuous signals.

To address this discontinuity problem, one embodiment uses the overlapping segments to determine the vital signs. At each control time step, the current segment includes a first portion corresponding to previously processed intensity measurements (from the previous segment, which was processed in the previous control time step) and a second portion corresponding to newly measured intensities. For the first portion of the segment, the embodiment uses the intensity signal reconstructed from the frequency coefficients determined for the first portion during the previous control step. The reconstructed intensity is then concatenated with the measured intensity of the second portion using a weighted average to form the intensity measurements of the current segment. Such a concatenation has an effect of smoothing the differences between processed and unprocessed signals, which reduces the discontinuity of the estimated vital signs.
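The segment construction described above can be sketched as follows. This is one possible reading, under the assumption that the weighted average is applied over the overlapping samples between the previously reconstructed intensities and the raw measurements; the function name, the `weight` parameter, and the (time, region) array layout are hypothetical.

```python
import numpy as np

def build_segment(prev_reconstructed, measured, overlap, weight=0.5):
    """Form the intensity measurements of the current segment.

    prev_reconstructed: (T, N) intensities reconstructed from the
        frequency coefficients of the previous segment.
    measured: (T, N) raw intensities of the current segment.
    overlap: number of leading samples of the current segment that
        coincide with the tail of the previous segment.
    weight: assumed blending weight given to the reconstructed signal.
    """
    segment = np.asarray(measured, dtype=float).copy()
    if overlap <= 0:
        return segment  # adjoining segments: nothing to blend
    tail = np.asarray(prev_reconstructed, dtype=float)[-overlap:]
    # First portion: weighted average of reconstructed and measured values.
    segment[:overlap] = weight * tail + (1.0 - weight) * segment[:overlap]
    # Second portion (newly measured intensities) is kept as measured.
    return segment
```

A weight closer to 1 leans more heavily on the already denoised signal in the overlap, which in this sketch trades responsiveness to new measurements for smoother transitions between segments.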

It is an object of one embodiment to provide an RPPG system suitable for estimating vital signs of a person driving a vehicle. Such an RPPG system is useful for detecting changes in driver alertness and can help to prevent accidents. Unfortunately, the application of RPPG to driver monitoring presents several unique challenges. Specifically, during driving, illumination on the driver's face can change dramatically. For example, during the day, the sunlight is filtered by trees, clouds, and buildings before reaching the driver's face. As the vehicle moves, this direct illumination can change frequently and dramatically in both magnitude and spatial extent. At night, overhead streetlamps and headlights of approaching cars cause large-intensity, spatially non-uniform changes in illumination. These illumination changes can be so dramatic and omnipresent that a number of approaches to mitigate these illumination variations are not practical.

To address these challenges, one embodiment uses active in-car illumination, in a narrow spectral band in which sunlight and streetlamp spectral energy are both minimal. For example, due to the water in the atmosphere, the sunlight that reaches the earth's surface has much less energy around the wavelength of 940 nm than it does at other wavelengths. The light output by streetlamps and vehicle headlights is typically in the visible spectrum, with very little power at infrared wavelengths. To that end, one embodiment uses an active narrow-band illumination source at 940 nm and a camera filter at the same wavelength, which ensures that much of the illumination changes due to environmental ambient illumination are filtered away. Further, since this narrow band around 940 nm is beyond the visible range, humans do not perceive this light source and thus are not distracted by its presence. Moreover, the narrower the bandwidth of the light source used in the active illumination, the narrower the bandpass filter on the camera can be, which further rejects changes due to ambient illumination. For example, some implementations use an LED source and camera bandpass filters with 10 nm bandwidth.

Accordingly, one embodiment uses a narrow-bandwidth near-infrared (NIR) light source to illuminate the skin of the person at a narrow frequency band including a near-infrared wavelength of 940 nm and a NIR camera to measure the intensities of different regions of the skin in the narrow frequency band. In such a manner, the measurements of the intensities of each of the different regions are single channel measurements.

Some embodiments are based on recognition that in the narrow frequency band including the near-infrared wavelength of 940 nm, the signal observed by the NIR camera is significantly weaker than a signal observed by a color intensity camera, such as an RGB camera. However, experiments demonstrated the effectiveness of the sparse reconstruction RPPG used by some embodiments in handling the weak intensity signals.

In addition, some embodiments are based on recognition that because the intensity signal measured by the NIR camera is weak, additional methods can be beneficial to increase the SNR of the measured intensities. To that end, one embodiment uses a filter to denoise the measurements of the intensities of each of the different regions using outlier-robust principal components analysis (RPCA). The RPCA used by this embodiment is computationally demanding and may not be necessary when the intensity signal has a high SNR, as in the case of using an RGB camera. However, for the measurements in the 940 nm band, those additional computations can be justified.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A shows a schematic illustrating some principles used by some embodiments to determine vital signs of the person using remote photoplethysmography (RPPG).

FIG. 1B shows a schematic of some principles used by some embodiments to enforce joint sparsity in frequency domain on joint estimation of photoplethysmography waveforms for different regions of the skin of a person.

FIG. 1C shows a block diagram of a remote photoplethysmography (RPPG) system 100 c in accordance with some embodiments.

FIG. 1D shows a block diagram of an RPPG method according to one embodiment.

FIG. 2 shows a schematic of an exemplar RPPG method of some embodiments.

FIG. 3 shows a schematic of a power spectrum curve used for determining signal-to-noise ratio (SNR) of the RPPG signal used by some embodiments to evaluate usefulness of different regions.

FIG. 4A shows a schematic of an RPPG system according to one example.

FIG. 4B shows a schematic of an RPPG system according to another example.

FIG. 5 shows a plot of a spectrum of sunlight at the Earth's surface used by some embodiments.

FIG. 6 shows a plot for comparison of RPPG signal frequency spectrum in IR and RGB.

FIG. 7 shows a schematic of a vehicle including a processor for running an RPPG method to produce vital signs of a person in the vehicle according to one embodiment.

DETAILED DESCRIPTION

FIG. 1A shows a schematic illustrating some principles used by some embodiments to determine vital signs of the person using remote photoplethysmography (RPPG). Some embodiments are based on recognition that the sensitivity of the RPPG to noise in the measurements of intensities (e.g., pixel intensities in camera images) of a skin of a person 110 a is caused at least in part by independent derivation 140 a of photoplethysmographic waveforms from the intensities 120 a and 130 a of a skin of a person measured at different spatial positions. Some embodiments are based on recognition that at different positions, e.g., at different regions of the skin of the person, the measurement intensities can be subjected to different and sometimes even unrelated noise. To that end, photoplethysmographic waveforms that are independently estimated 140 a from intensities of different regions of the skin of a person may fail to assist each other in identifying such noise.

Some embodiments are based on recognition that measured intensities at different regions of the skin of the person can be subjected to different and sometimes even unrelated noise. In contrast, the heartbeat is a common source of intensity variations present in different regions of the skin. Thus, the effect of the noise on the quality of the RPPG estimation can be reduced when the independent estimation 140 a is replaced 150 a with a joint estimation 160 a of different photoplethysmographic waveforms of intensity of a skin of a person measured at different regions of the skin. In this way, the embodiments can extract the PPG waveform that is common to many regions (including regions that may also contain considerable noise), while ignoring noise signals that are not shared across many regions.

Some embodiments are based on recognition that it can be beneficial to estimate the PPG waveforms of different regions collectively, i.e., using a common metric 180 a. Some embodiments are based on recognition that two types of noise are acting on the intensities of the skin, i.e., external noise and internal noise. The external noise affects the intensity of the skin due to external factors such as lighting variations, motion of the person, and resolution of the sensor measuring the intensities. The internal noise affects the intensity of the skin due to internal factors such as different effects of cardiovascular blood flow on the appearance of different regions of the skin of a person. For example, the heartbeat can affect the intensity of the forehead and cheeks of a person more than it affects the intensity of the nose.

Some embodiments are based on realization that both types of noise can be addressed in the frequency domain of the intensity measurements. Specifically, the external noise is often non-periodic or has a periodic frequency different than that of the signal of interest (e.g., the pulsatile signal), and thus can be detected in the frequency domain. In addition, the internal noise, while resulting in intensity magnitude variations or time-shifts of the intensity variations in different regions of the skin, preserves the periodicity of the common source of intensity variations in the frequency domain.

To that end, some embodiments are based on realization that the common metric used to estimate the photoplethysmographic waveforms of different regions should be enforced in the frequency domain 180 a of the intensity measurements, rather than in the domain of the intensity measurements themselves. In addition, joint sparsity of the frequency coefficients forces different photoplethysmographic waveforms to be sparse together in the same frequency bins and/or to have large energy only in the same frequency bins. Hence, the joint sparsity adequately reflects the notion of the common source of intensity variations used by some embodiments.

FIG. 1B shows a schematic of some principles used by some embodiments to enforce joint sparsity in the frequency domain for joint estimation of photoplethysmographic waveforms for different regions of the skin of a person. Some embodiments are based on realization that since some vital signs, such as a heartbeat signal, are locally periodic and exist within all regions, this common metric should be enforced in the frequency domain. However, intensity measurements can be affected by noise that is also periodic. Therefore, if the frequency coefficients of photoplethysmographic waveforms are directly derived from the intensity measurements, the enforcement of the common metric in frequency domain on such a direct estimation is problematic.

However, some embodiments are based on another realization that direct estimation 110 b of photoplethysmographic waveforms, in which the waveforms are derived from measurements, can be replaced 120 b with an optimization framework to reconstruct 130 b the frequency coefficients of the photoplethysmographic waveforms to match the measured intensities, rather than to directly compute the frequency coefficients from the measured intensities. Such a reverse direction in the estimation of the frequency coefficients allows performing the reconstruction subject to constraints that can enforce the common metric, i.e., the joint sparsity, on the frequency coefficients of different photoplethysmographic waveforms of different regions.

FIG. 1C shows a block diagram of a remote photoplethysmography (RPPG) system 100 c in accordance with some embodiments. The system 100 c includes a processor 120 c configured to execute stored instructions, as well as a memory 140 c that stores instructions that are executable by the processor. The processor 120 c can be a single core processor, a multi-core processor, a computing cluster, or any number of other configurations. The memory 140 c can include random access memory (RAM), read only memory (ROM), flash memory, or any other suitable memory systems. The processor 120 c is connected through a bus 106 c to one or more input and output devices.

The instructions stored in the memory 140 c implement an RPPG method for estimating the vital signs of the person from measurements of intensities of a skin of a person. To that end, the RPPG system 100 c can also include a storage device 130 c adapted to store intensity values 134 c and various modules such as components 131 c, 132 c, and 133 c executed by the processor 120 c to perform the vital signs estimations. The storage device 130 c can be implemented using a hard drive, an optical drive, a thumb drive, an array of drives, or any combinations thereof.

For example, the RPPG system 100 c includes a solver 131 c to solve an optimization problem to determine frequency coefficients of photoplethysmographic waveforms corresponding to the measured intensities at the different regions. According to some principles employed by different embodiments, the solver determines the frequency coefficients to reduce a distance between intensities of the skin reconstructed from the frequency coefficients and the corresponding measured intensities of the skin while enforcing joint sparsity on the frequency coefficients. Such a reverse reconstruction of the frequency coefficients allows enforcing a common metric, i.e., the joint sparsity, in the frequency domain.

The RPPG system 100 c includes an estimator 132 c to estimate the vital signs of the person from the determined frequency coefficients of photoplethysmographic waveforms. In some implementations, the RPPG system 100 c includes a filter 133 c to denoise the measurements of the intensities of each of the different regions using robust principal components analysis (RPCA).

The system 100 c includes an input interface 150 c to receive a sequence of measurements of intensities of different regions of a skin of a person indicative of vital signs of the person. For example, the input interface can be a network interface controller adapted to connect the RPPG system 100 c through the bus 106 c to a network 190 c. Through the network 190 c, the values of intensity measurements 195 c can be downloaded and stored as intensity values 134 c within the computer's storage device 130 c for storage and/or further processing.

Additionally, or alternatively, in some implementations, the RPPG system 100 c is connected to a remote sensor 112 c, such as a camera, to collect the intensity values 134 c. In some implementations, a human machine interface (HMI) 110 c within the system 100 c connects the system to input devices 111 c, such as a keyboard, a mouse, trackball, touchpad, joy stick, pointing stick, stylus, or touchscreen, among others.

The RPPG system 100 c can be linked through the bus 106 c to an output interface to render the vital signs of the person. For example, the RPPG system 100 c can include a display interface 160 c adapted to connect the system 100 c to a display device 165 c, wherein the display device 165 c can include a computer monitor, camera, television, projector, or mobile device, among others.

The RPPG system 100 c can also include and/or be connected to an imaging interface 170 c adapted to connect the system to an imaging device 175 c. The imaging device 175 c can include a video camera, computer, mobile device, webcam, or any combination thereof.

In some embodiments, the RPPG system 100 c is connected through the bus 106 c to an application interface 180 c adapted to connect the RPPG system 100 c to an application device 185 c that can operate based on results of remote photoplethysmography. For example, in one embodiment, the device 185 c is a car navigation system that uses the vital signs of a person to decide how to control, e.g., steer, the car. In other embodiments, the device may be used to control components of the vehicle. For instance, in one embodiment, the device 185 c is a driver monitoring system, which uses the vital signs of the driver to determine when the driver is able to drive safely, e.g., whether the driver is drowsy or not.

FIG. 1D shows a block diagram of an RPPG method according to one embodiment. A set of different skin regions 120 of a person are measured using an input interface 110, such as a video camera that measures the intensity of the light reflecting off the skin as it varies over a period of time, to produce a raw RPPG matrix, P 100. The diagram shows skin regions that are located on the face (facial regions), but it is understood that various embodiments are not limited to using the face; other embodiments use other regions of exposed skin, such as the person's neck or wrists. The raw RPPG matrix 100, which includes measured intensities of the facial regions over time, is processed using a solver 150, which is an implementation of the solver 131 c, that determines 140 frequency coefficients 130 that correspond to the person's vital signs through an iterative process.

In some implementations, the iterative process begins by setting estimated frequency coefficients 185 of all facial regions to 0 and computing an inverse Fourier transform 170 of the frequency coefficients 185 to produce estimated region intensities 175. These estimated region intensities 175, which represent the system's estimate of the RPPG signal, are then subtracted 191 from the raw RPPG matrix 100. The difference 191 between the raw RPPG matrix 100 and the estimated region intensities 175 is transformed using a Fourier transform 160 to produce temporary frequency coefficients 161. The temporary frequency coefficients 161 are added 192 to the estimated frequency coefficients 185 to produce updated frequency coefficients 162. The updated frequency coefficients 162 are modified to enforce joint sparsity 180, and the resulting frequency coefficients are used as the new estimated frequency coefficients 185. The new estimated frequency coefficients 185, which replace the previous iteration's estimated frequency coefficients 185, are used for a next iteration of the solver process 150.

In some embodiments, the solver enforces the joint sparsity as a soft constraint of the optimization problem, such that enforcing joint sparsity 180 forces the estimated frequency coefficients 185 to have nonzero values in only a small number of frequency bins, and the nonzero bins are the same frequency bins across all facial regions. The iterative solver process 150 is repeated until a convergence condition 186 is met, for example, when the new estimated frequency coefficients 185 are essentially unchanged from the previous iteration's estimated frequency coefficients 185. After convergence 186, the estimated frequency coefficients 185 are output by the solver and are used to estimate vital signs 140. For example, in one embodiment an estimated vital sign is the frequency of the heartbeat 130 of the person over the period of time.
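The iterative process of FIG. 1D can be sketched in a few lines of NumPy. This is a minimal sketch rather than the embodiment's implementation: it assumes an orthonormal Fourier transform, a unit step size, row-wise soft thresholding as the joint-sparsity step, and a heart-rate search band of 0.7 to 4 Hz. The function names and the parameters `tau`, `tol`, and `band` are hypothetical.

```python
import numpy as np

def joint_sparse_spectrum(P, tau, max_iter=100, tol=1e-8):
    """Estimate jointly sparse frequency coefficients for P of shape (T, N).

    Rows of P are time samples and columns are skin regions.
    """
    T, N = P.shape
    X = np.zeros((T, N), dtype=complex)                # coefficients start at 0
    for _ in range(max_iter):
        estimate = np.fft.ifft(X, axis=0, norm="ortho")          # inverse transform
        residual = P - estimate                                  # subtract from raw matrix
        update = X + np.fft.fft(residual, axis=0, norm="ortho")  # add transformed residual
        # Joint sparsity (assumed form): shrink each frequency bin across regions.
        norms = np.linalg.norm(update, axis=1, keepdims=True)
        X_new = update * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
        # Convergence: coefficients essentially unchanged from the last iteration.
        converged = np.linalg.norm(X_new - X) <= tol * max(np.linalg.norm(X_new), 1.0)
        X = X_new
        if converged:
            break
    return X

def dominant_frequency_hz(X, fs, band=(0.7, 4.0)):
    """Return the in-band frequency with the most energy summed over regions."""
    freqs = np.fft.fftfreq(X.shape[0], d=1.0 / fs)
    energy = np.sum(np.abs(X) ** 2, axis=1)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(freqs[mask][np.argmax(energy[mask])])
```

Under these simplifying assumptions (no channel response, orthonormal transform, unit step) the loop settles after the first pass, so the sketch mainly illustrates the order of the operations; multiplying the returned frequency by 60 gives beats per minute.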

FIG. 2 shows a schematic of an exemplar RPPG method of some embodiments adapted for RPPG signal tracking and denoising that can be applied to videos recorded with a combination of NIR illumination and unknown ambient lighting. In some implementations, the RPPG method extracts, tracks, and denoises the RPPG signal to obtain a heart rate measurement. The method adaptively selects facial regions, i.e., a plurality of facial regions for which the desired vital sign contributes significantly to the raw RPPG signal, and denoises their estimated RPPG signals by relying on the fact that the pulsatile signal should be sparse in the frequency domain and low-rank across facial regions.

The RPPG method obtains 210 the raw RPPG signals from a video of a person by averaging the pixel intensity over all pixels in each of N skin regions 120 at each time step (e.g., each video frame). In some embodiments these skin regions 120 are facial regions that are focused around the forehead, cheeks, and chin area. In some embodiments, the RPPG method excludes areas along the face boundary as well as the eyes, nose, and mouth, since these areas exhibit weak RPPG signals.

For every facial region j∈{1, . . . , N}, the measured intensity p_(j)(t) is a one-dimensional time series signal, where t∈{1, . . . , T} is the temporal video frame index within a temporal window of length T. In some embodiments, the measured intensity p_(j)(t) is the mean of the intensities of all pixels in region j at time t. In other embodiments, the measured intensity may be another measure of region intensity, such as the median intensity across the pixels in region j at time t, or some other robust average measure of the region's intensity.
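As a minimal sketch of this step, the following computes the T×N matrix of region intensities from a stack of single-channel frames. The array layout, the boolean region masks, and the function name are assumptions made for illustration; the embodiments do not prescribe a data format.

```python
import numpy as np

def region_intensities(frames, region_masks, reducer="mean"):
    """Compute p_j(t) for every region j and frame t.

    frames:       (T, H, W) array of single-channel pixel intensities
    region_masks: (N, H, W) boolean array, True where a pixel belongs to region j
    reducer:      "mean" or "median"
    Returns a (T, N) array whose column j is the time series p_j.
    """
    reduce_fn = {"mean": np.mean, "median": np.median}[reducer]
    T = frames.shape[0]
    N = region_masks.shape[0]
    P = np.empty((T, N))
    for j in range(N):
        pixels = frames[:, region_masks[j]]  # shape (T, number of pixels in region j)
        P[:, j] = reduce_fn(pixels, axis=1)
    return P
```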

In the general formulation, the RPPG method models the RPPG measurements from the N facial regions as a multichannel signal acquisition scenario, in which every facial region j provides a different channel measurement of the underlying heartbeat signal contaminated by noise. In particular, one embodiment models the measured signals p_(j)(t) as follows:

$p_j(t) = h_j(t) * y_j(t) + n_j(t), \quad (1)$

where * is the linear convolution operator, y_(j)(t) denotes the heartbeat signal observed at channel j, and h_(j)(t) and n_(j)(t) denote the channel response function and channel noise, respectively. Since the heartbeat signal is known to be sparse in the frequency domain, we rewrite (1) in vector form as shown below:

$p_j = h_j * F^{-1} x_j + n_j, \quad (2)$

where F is the one-dimensional discrete Fourier transform of size T, and x_(j)∈C^(T) denotes the sparse frequency spectrum of the heartbeat signal y_(j)∈R^(T).

The signal model in (2) is a blind multichannel estimation problem that appears in fields such as wireless communications and sensor calibration. In particular, if x_(j)=x is fixed across all regions j, the problem is cast as the self-calibration from multiple snapshots model. The recoverability of these models relies on the ability to find low-dimensional characterizations of the channel responses h_(j) and the sparse signal x. These embodiments consider the following signal model:

$p_j = F^{-1} x_j + n_j, \quad (3)$

where the sparse spectrum signals x_(j) are not equal to each other, but they share the same support, i.e., the frequencies that have nonzero energy are mostly the same across all facial regions.

In some embodiments, the RPPG method denoises the measurements of the intensities of each of the different regions using robust principal components analysis (RPCA). For example, one embodiment processes the RPPG data by considering sliding time windows of length T. As shown in FIG. 2, for each time window, some embodiments stack the N RPPG signals into a T×N RPPG matrix P 100. In one embodiment, the columns of the matrix P are preprocessed by dividing the entries of each column by the average energy in the column. The process of dividing by the average intensity normalizes the range of intensities in the matrix P, allowing the processing to treat all regions equally.

The RPPG matrix 100 contains raw RPPG signals that can be contaminated by large amounts of noise due to factors such as inaccurate motion alignment, abrupt illumination changes, and variations in the strength of the RPPG signal across regions. However, all regions should express the same periodic physiological signal caused by the cardiac cycle (i.e., the pulsatile signal). Moreover, the periodicity of the underlying heartbeat signal over the duration of the temporal window induces a low-rank matrix when the noise is removed. Therefore, some embodiments model the RPPG matrix P 100 as the superposition of a low-rank matrix Y containing the heartbeat signal and a noise matrix N=E+S, where E denotes inlier noise 232 and S denotes outlier noise 222, such that

$P = Y + N = Y + E + S = Z + S, \quad (4)$

where Z=Y+E collects the heartbeat signal and the inlier noise.

For example, the outlier noise 222 arises from abrupt illumination changes and region tracking errors. These generally occur over a short time duration relative to the temporal processing window and affect a small number of regions. The inlier noise 232 characterizes regions of the face where the heartbeat signal is not the dominant driving harmonic. In this case, some embodiments suppress such regions from the heartbeat signal estimation. In order to extract an estimate of Y from P and suppress outliers, the embodiments follow a robust principal component analysis (RPCA) 240 approach and formulate the following optimization problem:

$\min_{Z,S} \|Z\|_{*} + \gamma \|S\|_{1} \quad \text{subject to} \quad P = Z + S, \quad (5)$

where ∥Z∥_(*)=Σ_(k)σ_(k)(Z) denotes the nuclear norm of the matrix Z, which equals the sum of its singular values σ_(k). The l₁ norm of a matrix S is defined as ∥S∥₁=Σ_(t,j)|S(t,j)|, which equals the sum of the absolute values of its entries. The parameter γ controls the relative proportion of the signal energy that will be absorbed into the noise component S (e.g., in one embodiment we set γ=0.05). A smaller value of γ allows more of the signal to be considered as noise.

Various embodiments use different methods for solving optimization problem (5). For example, one embodiment splits the low-rank matrix Z into two factors Z=LR^(T), where L∈R^(T×r), R∈R^(N×r), and r<min(T,N) is a rank estimate parameter (e.g., in one embodiment we set rank r=12). Notice that the RPCA model is capable of eliminating sparse outlier noise 222 from the RPPG measurements, making this approach fast and accurate.
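One possible way to solve problem (5) is sketched below for illustration only. The sketch does not use the factorization Z=LR^(T) described above; instead it swaps in a generic augmented Lagrangian (ADMM-style) iteration with singular value thresholding, which is a common approach to this kind of problem. The function names, the initial penalty `mu`, the growth factor `rho`, and the stopping tolerance are assumptions.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise shrinkage: proximal operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def singular_value_threshold(M, tau):
    """Shrink singular values by tau: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca_admm(P, gamma=0.05, rho=1.5, max_iter=200, tol=1e-7):
    """Split P into a low-rank part Z and a sparse outlier part S with P = Z + S."""
    norm_P = np.linalg.norm(P)
    Z = np.zeros_like(P, dtype=float)
    S = np.zeros_like(P, dtype=float)
    if norm_P == 0.0:
        return Z, S
    Y = np.zeros_like(P, dtype=float)        # Lagrange multiplier for P = Z + S
    mu = 1.25 / np.linalg.norm(P, 2)         # initial penalty (heuristic, an assumption)
    for _ in range(max_iter):
        Z = singular_value_threshold(P - S + Y / mu, 1.0 / mu)
        S = soft_threshold(P - Z + Y / mu, gamma / mu)
        residual = P - Z - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_P:
            break
        mu *= rho                            # gradually tighten the constraint
    return Z, S
```

A factorized solver with a small rank r avoids the full singular value decomposition in every iteration, which is one reason an embodiment may prefer it for larger windows.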

Denoising of the RPPG signals using RPCA 240 is illustrated in section 220 of FIG. 2. However, it may happen in some instances that the signal from a facial region is noisy for the entire time window. Such a noise distribution could still be modeled as low-rank, and would therefore not be removed by RPCA. Some embodiments address such noise artifacts using sparse spectrum estimation 250.

For example, over a short time window, the heartbeat signal is approximately periodic, composed of a dominant frequency along with its harmonics. As a result, the frequency spectrum of a heartbeat signal should be sparse. Moreover, the same heartbeat signal drives the periodic behavior in the RPPG signals across all facial regions. Therefore, the noise-free frequency spectra x_(j) of the signals y_(j) from all regions j should have the same support.

Consider the signal model in (4), rewritten to model the denoised output of RPCA as z_(j)=F⁻¹x_(j)+e_(j) and written in matrix form below:

$Z = F^{-1} X + E = \begin{bmatrix} F^{-1} & I \end{bmatrix} \begin{bmatrix} X \\ E \end{bmatrix}, \quad (6)$

where E 232 corresponds to the region level noise. Therefore, if a region is noisy, some embodiments absorb the entire time window (all samples) of that region into the matrix E. This can be achieved by forcing complete columns of E to be either zero or nonzero. On the other hand, since the frequency components in X 130 should be sparse and have the same support across all the regions, the columns of X are jointly sparse, i.e., the entire rows of X are either completely zero or nonzero.

Consequently, some embodiments define the following optimization problem to compute X 130 and E 232 from Z 221:

$\min_{X,E} \frac{1}{2} \left\| Z - A \begin{bmatrix} X \\ E \end{bmatrix} \right\|_{2}^{2} + \lambda \|X\|_{2,1} + \mu \|E^{\top}\|_{2,1}, \quad (7)$

where we defined matrix A as the block matrix A=[F⁻¹ I], and the l_(2,1) norm of a matrix X is defined as

$\|X\|_{2,1} = \sum_{t} \sqrt{\sum_{j} |X(t,j)|^{2}}. \quad (8)$

In one embodiment, we set λ=0.2, μ=1. The solution to the above problem can be obtained using iterative shrinkage/thresholding methods, such as FISTA, where the shrinkage function should be applied appropriately to the row norms of X and column norms of E to produce the frequency spectra 230 of recovered signal X 130 and noise E 232.
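For illustration, problem (7) can be approached with a plain iterative shrinkage/thresholding loop, sketched below under stated assumptions. The sketch uses the unaccelerated ISTA variant rather than FISTA, a unitary DFT (so that the step size 1/2 matches the block matrix A=[F⁻¹ I]), and a fixed iteration count; all function names are hypothetical.

```python
import numpy as np

def shrink_rows(X, tau):
    """Shrink the l2 norm of each row of X (one frequency bin across regions) by tau."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def shrink_columns(E, tau):
    """Shrink the l2 norm of each column of E (one region across time) by tau."""
    norms = np.linalg.norm(E, axis=0, keepdims=True)
    return E * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def objective(Z, X, E, lam, mu):
    """Value of the cost in (7) for a unitary inverse DFT."""
    R = Z - np.fft.ifft(X, axis=0, norm="ortho").real - E
    return (0.5 * np.sum(R ** 2)
            + lam * np.sum(np.linalg.norm(X, axis=1))
            + mu * np.sum(np.linalg.norm(E, axis=0)))

def sparse_spectrum_ista(Z, lam=0.2, mu=1.0, n_iter=300):
    """Estimate jointly sparse spectra X and region-level noise E from denoised Z."""
    T, N = Z.shape
    X = np.zeros((T, N), dtype=complex)
    E = np.zeros((T, N))
    step = 0.5  # 1 / L, where L = 2 is the largest eigenvalue of A A^H for unitary F
    for _ in range(n_iter):
        R = Z - np.fft.ifft(X, axis=0, norm="ortho").real - E    # current residual
        X = shrink_rows(X + step * np.fft.fft(R, axis=0, norm="ortho"), step * lam)
        E = shrink_columns(E + step * R, step * mu)
    return X, E
```

Row shrinkage on X and column shrinkage on E mirror the two regularizers in (7): a frequency bin is kept or discarded for all regions together, and a region is kept or absorbed into the noise for the whole window.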

In such a manner, the optimization problem includes a two-one norm of the frequency coefficients and a difference between the measured intensities and the intensities reconstructed from the frequency coefficients, wherein the two-one norm of the frequency coefficients is weighted or non-weighted. Enforcing the joint sparsity of X using an l_(2,1) norm regularization forces facial regions to have nonzero frequency coefficients in at most a small number of frequency bins that are common to a plurality of facial regions, and sets all remaining frequency coefficients to zero.

Fusion of Time Windows

Since heartbeat signals vary slowly over time, we may consider the RPPG observations as multichannel measurements from a nearly stationary process. Therefore, we process the RPPG signals using a sliding window

$P = \begin{bmatrix} P_{o} \\ P_{n} \end{bmatrix}, \quad (9)$

where P_(n) denotes the new RPPG data that did not exist in the previous window, and P_(o) is the portion of the previous (old) window's RPPG data that is also in the current window. For better noise suppression, we construct a weighted-average time-fused window

$\overline{P} = \alpha P + (1 - \alpha) \begin{bmatrix} \tilde{Y}_{o} \\ P_{n} \end{bmatrix}, \quad (10)$

where $\tilde{Y} = F^{-1} X$ is the filtered output of the previous time window, and $\tilde{Y}_{o}$ is the portion of $\tilde{Y}$ that is also present in the current window. The time-fused window $\overline{P}$ is then further denoised using the RPCA procedure, and the new sparse spectrum is estimated as described above.

In such a manner, the vital signs are determined iteratively for time windows defining different segments of the sequence of measurements. A segment of measurements for a current iteration includes a first portion and a second portion, the first portion is formed by intensities reconstructed from the frequency coefficients determined during the previous iteration for a time period corresponding to the first portion of the segment of the current iteration, and the second portion is formed by intensities measured for a time period of the second portion of the segment. In such a manner, discontinuity of vital signs estimation can be reduced.

For example, one embodiment uses a time window of duration 10 seconds and an overlap between time windows, where only 10 frames (0.33 seconds for videos recorded at 30 frames per second (fps)) come from the new time window, and we set the weight for averaging of time-fused windows to α=0.03.
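The time-fused window of equation (10) can be sketched as follows. The function name and the argument layout (old rows, new rows, and the overlapping part of the previous filtered output passed separately) are assumptions chosen for clarity.

```python
import numpy as np

def fuse_windows(P_old, P_new, Y_prev_overlap, alpha=0.03):
    """Build the time-fused window of equation (10).

    P_old:          rows of the current window that were also in the previous window (P_o)
    P_new:          rows that are new in the current window (P_n)
    Y_prev_overlap: filtered output of the previous window on the overlapping rows
    alpha:          weight given to the raw current window
    """
    P = np.vstack([P_old, P_new])               # equation (9)
    prior = np.vstack([Y_prev_overlap, P_new])  # previous estimate on old rows, raw data on new rows
    return alpha * P + (1.0 - alpha) * prior
```

With this weighting the new rows pass through unchanged, while the overlapping rows are dominated by the previous filtered estimate when α is small. For the example values above, a 10 second window at 30 fps contains 300 frames, of which 290 overlap with the previous window.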

Preprocessing to Reject Facial Regions

Some facial regions are physiologically known to contain better RPPG signals. However, the “goodness” of these facial regions also depends on the particular video conditions, facial hair, or facial occlusions. Therefore, it is beneficial to identify which regions are likely to contain the most noise and remove them before any processing, so that they do not affect the signal estimates.

FIG. 3 shows a schematic of a power spectrum curve used for determining the signal-to-noise ratio (SNR) of the RPPG signal, which some embodiments use to evaluate the usefulness of different regions. For example, some embodiments discard a region if its SNR is below a threshold θ_(SNR) (e.g., θ_(SNR)=0.2) or if its maximum amplitude is above a threshold θ_(amp). For example, one embodiment sets θ_(amp) to be four times the average RPPG signal amplitude. The SNR is determined 300 as the ratio of the area under the power spectrum curve in a region 310 surrounding the maximum peak in the frequency spectrum to the area under the curve in the rest of the frequency spectrum, within a frequency range that contains the physiological range 320 of heartbeat signals (e.g., from 30 to 300 beats per minute (bpm)).
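A minimal sketch of this region rejection test is given below. The half-width of the region 310 around the peak (`peak_halfwidth_bpm`), the use of the mean absolute value as the "average RPPG signal amplitude", and the function names are assumptions; the embodiments do not fix these details.

```python
import numpy as np

def spectrum_snr(signal, fps, band_bpm=(30.0, 300.0), peak_halfwidth_bpm=10.0):
    """Ratio of spectral power near the dominant peak to the remaining in-band power."""
    signal = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
    freqs_bpm = np.fft.rfftfreq(signal.size, d=1.0 / fps) * 60.0
    in_band = (freqs_bpm >= band_bpm[0]) & (freqs_bpm <= band_bpm[1])
    peak = np.argmax(np.where(in_band, power, -np.inf))      # maximum peak in the band
    near_peak = in_band & (np.abs(freqs_bpm - freqs_bpm[peak]) <= peak_halfwidth_bpm)
    signal_area = power[near_peak].sum()                     # area in region 310
    noise_area = power[in_band & ~near_peak].sum()           # rest of the band 320
    return np.inf if noise_area == 0 else signal_area / noise_area

def select_regions(P, fps, theta_snr=0.2, amp_factor=4.0):
    """Return a boolean mask of regions (columns of P) that pass both tests."""
    P = np.asarray(P, dtype=float)
    theta_amp = amp_factor * np.mean(np.abs(P))   # assumed definition of average amplitude
    keep = np.zeros(P.shape[1], dtype=bool)
    for j in range(P.shape[1]):
        snr_ok = spectrum_snr(P[:, j], fps) >= theta_snr
        amp_ok = np.max(np.abs(P[:, j])) <= theta_amp
        keep[j] = snr_ok and amp_ok
    return keep
```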

Some embodiments, within each time window, can reject different facial regions. To perform fusion of time windows, the embodiments first recompose the signal X in the missing regions by interpolating from neighboring regions.

EXEMPLAR IMPLEMENTATION(S)

To compute the raw RPPG signals P 100, some embodiments first use a face alignment (i.e., facial landmark detection) method to detect a number (e.g., 68) of facial landmarks 211, then interpolate and extrapolate the detected landmarks to a larger number (e.g., 145) of interpolated landmarks 212 to include the forehead region and subdivide the face into more regions. We use the facial landmarks to divide the face into a number (e.g., 48) of facial regions 120.

Since pixel intensity changes due to the heartbeat are a small fraction of each pixel's intensity, it is necessary to average pixels' intensities over a region of the face in order to get a consistent signal. However, this signal is still sensitive to which pixels are included in the region. In some embodiments, we re-detect the facial regions in each video frame. However, applying a face alignment algorithm independently to each frame can cause regions to move a small amount from one frame to the next, and this change in which pixels are included in a region can add extra noise that makes it difficult to estimate the vital sign of interest. To minimize such noise, some embodiments track the facial landmarks or the facial regions of interest. For example, one embodiment tracks the facial regions using a Kanade-Lucas-Tomasi (KLT) tracker and the RANSAC method. In each frame, the embodiment spatially averages the pixel intensities in each facial region to obtain a raw RPPG signal. The embodiment subtracts the mean intensity over time of each region's signals and uses a bandpass filter to restrict the signals to the frequency range [30 bpm, 300 bpm], which includes the physiological range of the cardiac signals of interest.
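The mean subtraction and bandpass filtering described above can be sketched as follows. The embodiment does not specify a filter design, so this sketch assumes an ideal frequency-domain bandpass that simply zeroes out-of-band DFT bins; the function name is hypothetical.

```python
import numpy as np

def detrend_and_bandpass(P, fps, band_bpm=(30.0, 300.0)):
    """Remove each region's temporal mean and keep only the 30 to 300 bpm band.

    P:   (T, N) array of raw region intensities, one column per region
    fps: video frame rate in frames per second
    """
    P = np.asarray(P, dtype=float)
    T = P.shape[0]
    centered = P - P.mean(axis=0, keepdims=True)         # subtract mean intensity over time
    spectrum = np.fft.rfft(centered, axis=0)
    freqs_bpm = np.fft.rfftfreq(T, d=1.0 / fps) * 60.0   # bin frequencies in beats per minute
    keep = (freqs_bpm >= band_bpm[0]) & (freqs_bpm <= band_bpm[1])
    spectrum[~keep, :] = 0.0                             # ideal bandpass (an assumption)
    return np.fft.irfft(spectrum, n=T, axis=0)
```

An ideal filter of this kind can ring at window edges; a practical implementation may prefer a smoother filter response.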

To combine the denoised signals from each facial region, we take a median in each frequency bin across the regions of X. We use the median because it obtains an average that is robust to outliers, but other embodiments can use other averaging methods, such as a mean, in place of the median. The estimate of the heart rate in the time window is the frequency component for which the power of the frequency spectrum is maximum.
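As an illustrative sketch, the heart rate estimate can be read off the estimated spectra as shown below. Because X is complex-valued, the sketch assumes that the median is taken over the power |X|² of each frequency bin, and it restricts the peak search to the 30 to 300 bpm range; both choices and the function name are assumptions.

```python
import numpy as np

def estimate_heart_rate(X, fps, band_bpm=(30.0, 300.0)):
    """Return the heart rate in bpm from T x N frequency coefficients X."""
    T = X.shape[0]
    power = np.median(np.abs(X) ** 2, axis=1)            # median across regions per bin
    freqs_bpm = np.fft.fftfreq(T, d=1.0 / fps) * 60.0    # signed bin frequencies in bpm
    in_band = (freqs_bpm >= band_bpm[0]) & (freqs_bpm <= band_bpm[1])
    peak = np.argmax(np.where(in_band, power, -np.inf))  # bin with maximum power
    return float(freqs_bpm[peak])
```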

FIG. 4A shows a schematic of an RPPG system according to one example. In this example, a patient 10 is hospitalized in a hospital bed. In such a hospitalization scenario, the vital signs of the patient 10 need to be monitored. Conventional monitoring systems usually rely on attachable sensors, i.e., body-mounted sensors. In order to increase patient comfort, remote monitoring systems can be used, which can reduce the required cabling. FIG. 4A illustrates a monitoring system 12 for remotely monitoring a vital sign of a patient 10 according to an aspect of the present invention. The illustrated system 12 makes use of the remote photoplethysmographic measurement principle. To this end, a camera 14 is used to capture an image, i.e., a video sequence, of the patient 10.

This camera can include a CCD or CMOS sensor for converting incident light and the intensity variations thereof into an electronic signal. The camera 14 particularly non-invasively captures light reflected from a skin portion of the patient 10. A skin portion may thereby particularly refer to the forehead or the chest of the patient. A light source, e.g., an infrared or visible light source, may be used to illuminate the patient or a region of interest including a skin portion of the patient. It may also be possible that the patient 10 is illuminated with light of a certain limited spectrum or that two specific spectra (i.e., colors) are captured separately in order to analyze differences resulting therefrom. Based on the captured images, information on a vital sign of the patient 10 can be determined. In particular, vital signs such as the heart rate, the breathing rate, or the blood oxygenation of the patient 10 can be determined. The determined information is usually displayed on an operator interface 16 for presenting the determined vital sign. Such an operator interface 16 may be a patient bedside monitor or may also be a remote monitoring station in a dedicated room in a hospital or even in a remote location in telemedicine applications. Prior to being able to display vital sign information, the detected images need to be processed. The detected images may, however, comprise noise components. The main sources of noise are motion of the patient 10 and (ambient) light fluctuations. Hence, appropriate signal processing is required. Usually, a plurality of time signals being more or less representative of vital signs (heart rate, breathing rate, blood oxygen saturation) is acquired. The acquisition may thereby be operated on a specific spectral range (visible, infrared, combination of selected spectral bands), may be operated at a global or local level (one time signal per skin measurement area versus several signals originating from the skin measurement area), and may involve techniques like principal component analysis, independent component analysis, local density approximation, linear projection into color subspaces, or signal decomposition techniques like wavelets, sinusoidal modeling, and Empirical Mode Decomposition (EMD).

FIG. 4B shows a schematic of an RPPG system according to another example. In this example, the system 12 is adapted from a controlled environment of a hospital room to a volatile environment of the Driver Monitoring System (DMS) 40 to provide accurate measurements under motion/light distortions.

To that end, some embodiments provide an RPPG system suitable for estimating vital signs of a person driving a vehicle. Such an RPPG system is useful for detecting changes in driver alertness and can help to prevent accidents. Unfortunately, the application of RPPG to driver monitoring presents several unique challenges. Specifically, during driving, illumination on the driver's face can change dramatically. For example, during the day, the sunlight is filtered by trees, clouds, and buildings before reaching the driver's face. As the vehicle moves, this direct illumination can change frequently and dramatically in both magnitude and spatial extent. At night, overhead streetlamps and headlights of approaching cars cause large intensity, spatially non-uniform changes in illumination. These illumination changes can be so dramatic and omnipresent that a number of approaches to mitigate these illumination variations are not practical.

To address these challenges, additionally or alternatively to sparse reconstruction with joint sparsity disclosed above, one embodiment uses active in-car illumination, in a narrow spectral band in which sunlight and streetlamp spectral energy are both minimal.

FIG. 5 shows a plot of a spectrum of sunlight at the Earth's surface used by some embodiments. For example, due to the water in the atmosphere, the sunlight that reaches the Earth's surface has much less energy around the frequency of 940 nm than it does at other frequencies. The light output by streetlamps and vehicle headlights is typically in the visible spectrum, with very little power at infrared frequencies. To that end, one embodiment uses an active narrow-band illumination source at 940 nm and a camera filter at the same frequency, which ensures that much of the illumination changes due to environmental ambient illumination are filtered away. Further, since this narrow frequency band of 940 nm is beyond the visible range, humans do not perceive this light source and thus are not distracted by its presence. Moreover, the narrower the bandwidth of the light source used in the active illumination, the narrower the bandpass filter on the camera can be, which further rejects changes due to ambient illumination. For example, some implementations use an LED source and camera bandpass filters with 10 nm bandwidth.

Accordingly, one embodiment uses a narrow-bandwidth near-infrared (NIR) light source to illuminate the skin of the person at a narrow frequency band including a near-infrared frequency of 940 nm and a NIR camera to measure the intensities of different regions of the skin in the narrow frequency band. In such a manner, the measurements of the intensities of each of the different regions are single channel measurements.

Some embodiments are based on the recognition that in the narrow frequency band including the near-infrared frequency of 940 nm, the signal observed by the NIR camera is significantly weaker than a signal observed by a color intensity camera, such as an RGB camera. However, experiments demonstrated the effectiveness of the sparse reconstruction RPPG used by some embodiments in handling the weak intensity signals.

FIG. 6 shows a plot for comparison of the RPPG signal frequency spectrum in IR and RGB. The RPPG signal in IR 610 is about 10 times weaker than in RGB 620. Therefore, the RPPG system of one embodiment includes a near-infrared (NIR) light source to illuminate the skin of the person, wherein the NIR light source provides illumination in a first frequency band, and a camera to measure the intensities of each of the different regions in a second frequency band overlapping the first frequency band, such that the measured intensities of a region of the skin are computed from intensities of pixels of an image of the region of the skin.

The first frequency band and the second frequency band include a near-infrared frequency of 940 nm. The system includes a filter to denoise the measurements of the intensities of each of the different regions using robust principal components analysis (RPCA). The second frequency band, which in one embodiment is determined by a bandpass filter on the camera, has a passband of width less than 20 nm, e.g., the bandpass filter has a narrow passband whose full width at half maximum (FWHM) is less than 20 nm. In other words, the overlap between the first frequency band and the second frequency band is less than 20 nm wide. Such a system in combination with sparse reconstruction is able to perform RPPG in a DMS environment.

Some embodiments incorporate the realization that optical filters such as bandpass filters and long-pass filters (i.e., filters that block transmission of light whose wavelength is less than a cutoff wavelength but allow transmission of light whose wavelength is greater than a second, often equal, cutoff wavelength) may be highly sensitive to the angle of incidence of the light passing through the filter. For example, an optical filter may be designed to transmit and block specified frequency ranges when the light enters the optical filter parallel to the axis of symmetry of the optical filter (roughly perpendicular to the optical filter's surface), which we will call an angle of incidence of 0°. When an angle of incidence varies from 0°, many optical filters exhibit “blue shift,” in which the passband and/or cutoff frequencies of the filter effectively shift to shorter wavelengths. To account for this blue shift phenomenon, some embodiments use a center frequency of the overlap between the first and second frequency bands to have a wavelength greater than 940 nm (e.g., they shift the center frequency of a bandpass optical filter or the cutoff frequencies of a long-pass optical filter to have a longer wavelength than 940 nm).
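To give a rough sense of the size of the blue shift, the following sketch uses the commonly cited approximation for thin-film interference filters in air, λ(θ) = λ₀·sqrt(1 − (sin θ / n_eff)²). This formula is not stated in the embodiments, and the effective refractive index `n_eff=2.0` is a hypothetical placeholder; actual values depend on the filter design.

```python
import math

def shifted_center_wavelength(lambda0_nm, angle_deg, n_eff=2.0):
    """Approximate center wavelength of an interference filter at a given angle of incidence."""
    s = math.sin(math.radians(angle_deg)) / n_eff
    return lambda0_nm * math.sqrt(1.0 - s * s)

def design_center_wavelength(target_nm, angle_deg, n_eff=2.0):
    """Normal-incidence center wavelength needed so the shifted center lands on target_nm."""
    s = math.sin(math.radians(angle_deg)) / n_eff
    return target_nm / math.sqrt(1.0 - s * s)
```

Under these assumptions, a filter intended to pass 940 nm light arriving at a nonzero angle would be designed with a normal-incidence center wavelength somewhat longer than 940 nm, consistent with the shift described above.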

Furthermore, because light from different parts of the skin will be incident upon the optical filter at different angles of incidence, the optical filter allows different transmission of light from different parts of the skin. To compensate for this, some embodiments use a bandpass filter with a wider passband, e.g., the bandpass optical filter has a passband that is wider than 20 nm, and hence the overlap between the first and second frequency bands is greater than 20 nm wide.

FIG. 7 shows a schematic of a vehicle 701 including a processor 702 for running an RPPG 705 to produce vital signs 726 of a person in the vehicle according to one embodiment. In this embodiment, the NIR light source and/or the NIR camera 720 are arranged in the vehicle 701. For example, the NIR light source is arranged in the vehicle to illuminate the skin of the person driving the vehicle, and the NIR camera 720 is arranged in the vehicle to measure the intensities of different regions of the skin of the person driving the vehicle. This embodiment also includes a controller 750 to execute a control action based on the estimated vital signs of the person driving the vehicle. For example, the controller can reduce a speed of the vehicle and/or change a steering of the vehicle.

The above-described embodiments of the present invention can be implemented in any of numerous ways. For example, the embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. Such processors may be implemented as integrated circuits, with one or more processors in an integrated circuit component. A processor may be implemented using circuitry in any suitable format.

Also, the embodiments of the invention may be embodied as a method, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

Use of ordinal terms such as “first,” “second,” in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements.

Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention.

Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

The invention claimed is:
 1. A remote photoplethysmography (RPPG) system, comprising: a processor; and memory having instructions stored thereon that, when executed by the processor, cause the RPPG system to: transform intensities of pixels of one or multiple images of a skin of a person into a sequence of measurements of intensities of different regions of the skin of the person indicative of vital signs of the person; solve an optimization problem to determine frequency coefficients of photoplethysmographic waveforms corresponding to the measured intensities at each of the different regions to reduce a difference between intensities of the skin of different regions reconstructed from the frequency coefficients of corresponding regions and the corresponding measured intensities of the skin while enforcing joint sparsity on the frequency coefficients of different regions, wherein the joint sparsity is enforced on the optimization problem as a constraint specifying that the frequency coefficients of each region have more zero elements than non-zero elements and the non-zero elements of the frequency coefficients of different regions belong to the same frequency bins; estimate the vital signs of the person from the frequency coefficients of photoplethysmographic waveforms determined for different regions; output the estimated vital signs of the person; and submit the estimated vital signs to a controller of a vehicle configured to execute a control action based on the estimated vital signs of the person driving the vehicle.
 2. The RPPG system of claim 1, wherein the constraint enforces the joint sparsity as a soft constraint of the optimization problem.
 3. The RPPG system of claim 1, wherein the optimization problem includes a two-one norm of the frequency coefficients and the difference between the measured intensities and the intensities reconstructed from the frequency coefficients, wherein the two-one norm of the frequency coefficients is weighted or non-weighted.
 4. The RPPG system of claim 1, wherein the vital signs are determined iteratively for different segments of the sequence of measurements, wherein a segment of measurements for a current iteration includes a first portion and a second portion, the first portion is formed by intensities reconstructed from the frequency coefficients determined during the previous iteration for a time period corresponding to the first portion of the segment of the current iteration, and the second portion is formed by intensities measured for a time period of the second portion of the segment.
 5. A device communicatively connected to the RPPG system of claim 1, comprising: a camera configured to measure the intensities of pixels of the images of the skin of the person; and a display device configured to render the vital signs of the person.
 6. The device of claim 5, wherein the camera includes an optical filter configured to produce the intensities of the pixels of the image of different regions of the skin of the person in a non-visible frequency band including a near-infrared frequency corresponding to a wavelength of 940 nm.
 7. The RPPG system of claim 1, further comprising: a near-infrared (NIR) light source to illuminate the skin of the person, wherein the NIR light source provides illumination in a first frequency band, and a camera to measure the intensities of each of the different regions in a second frequency band overlapping the first frequency band, such that the measured intensities of a region of the skin are computed from intensities of pixels of an image of the region of the skin.
 8. The RPPG system of claim 7, wherein the first frequency band and the second frequency band include a near-infrared wavelength of 940 nm, wherein the overlap between the first frequency band and the second frequency band is less than or equal to 20 nm wide, further comprising: a filter to denoise the measurements of the intensities of each of the different regions using robust principal components analysis (RPCA).
 9. The RPPG system of claim 7, wherein the overlap between the first frequency band and the second frequency band is centered at a wavelength greater than 940 nm, wherein the overlap between the first frequency band and the second frequency band is greater than 20 nm wide, further comprising: a filter to denoise the measurements of the intensities of each of the different regions using robust principal components analysis (RPCA).
 10. A control system for controlling at least one component of the vehicle, comprising: the RPPG system of claim 7, wherein the NIR light source is arranged in the vehicle to illuminate the skin of the person driving the vehicle, and wherein the camera is arranged in the vehicle to measure the intensities of different regions of the skin of the person driving the vehicle; and the controller to execute a control action based on the estimated vital signs of the person driving the vehicle.
 11. The RPPG system of claim 1, wherein the sequence of measurements of intensities of a region of the skin is a mean or a median of intensities of pixels of an image of the region of the skin acquired by a camera.
 12. The RPPG system of claim 11, wherein the intensities of pixels belong to a single channel of the image.
 13. The RPPG system of claim 11, wherein the region of the skin is a region on a face of the person identified by the camera using automatic facial landmark localization.
 14. The RPPG system of claim 1, wherein to estimate the vital signs at each instant of time, the processor is configured to combine corresponding elements of the frequency coefficients of different regions that belong to the same frequencies to produce median frequency coefficients and reconstruct the vital signs from the median frequency coefficients.
 15. The RPPG system of claim 1, wherein the vital signs include a pulse rate of the person.
 16. A remote photoplethysmography (RPPG) method, wherein the method uses a processor coupled with stored instructions implementing the method, wherein the instructions, when executed by the processor carry out steps of the method, comprising: transforming intensities of pixels of one or multiple images of a skin of a person into a sequence of measurements of intensities of different regions of the skin of the person indicative of vital signs of the person; solving an optimization problem to determine frequency coefficients of photoplethysmographic waveforms corresponding to the measured intensities at each of the different regions to reduce a difference between intensities of the skin of different regions reconstructed from the frequency coefficients of corresponding regions and the corresponding measured intensities of the skin while enforcing joint sparsity on the frequency coefficients of different regions, wherein the joint sparsity is enforced on the optimization problem as a constraint specifying that the frequency coefficients of each region have more zero elements than non-zero elements and the non-zero elements of the frequency coefficients of different regions belong to the same frequency bins; estimating the vital signs of the person from the frequency coefficients of photoplethysmographic waveforms determined for different regions; and submitting the estimated vital signs to a controller of a vehicle configured to execute a control action based on the estimated vital signs of the person driving the vehicle.
 17. The RPPG method of claim 16, wherein the optimization problem includes a two-one norm of the frequency coefficients and a difference between the measured intensities and the intensities reconstructed from the frequency coefficients, wherein the two-one norm of the frequency coefficients is weighted or non-weighted, wherein the vital signs are determined iteratively for different segments of the sequence of measurements, wherein a segment of measurements for a current iteration includes a first portion and a second portion, the first portion is formed by intensities reconstructed from the frequency coefficients determined during the previous iteration for a period of the first portion of the segment and the second portion is formed by intensities measured for a period of the second portion of the segment.
 18. The RPPG method of claim 16, wherein the intensities of different regions of the skin of the person are measured in a frequency band including a near-infrared frequency of 940 nm.
 19. A non-transitory computer-readable storage medium embodied thereon a program executable by a processor for performing a method, the method comprising: transforming intensities of pixels of one or multiple images of a skin of a person into a sequence of measurements of intensities of different regions of the skin of the person indicative of vital signs of the person; solving an optimization problem to determine frequency coefficients of photoplethysmographic waveforms corresponding to the measured intensities at each of the different regions to reduce a difference between intensities of the skin of different regions reconstructed from the frequency coefficients of corresponding regions and the corresponding measured intensities of the skin while enforcing joint sparsity on the frequency coefficients of different regions, wherein the joint sparsity is enforced on the optimization problem as a constraint specifying that the frequency coefficients of each region have more zero elements than non-zero elements and the non-zero elements of the frequency coefficients of different regions belong to the same frequency bins; estimating the vital signs of the person from the frequency coefficients of photoplethysmographic waveforms determined for different regions; and rendering the vital signs of the person to a controller of a vehicle configured to execute a control action based on the estimated vital signs of the person driving the vehicle.
 20. A remote photoplethysmography (RPPG) system, comprising: a processor; and memory having instructions stored thereon that, when executed by the processor, cause the RPPG system to: receive a sequence of measurements of intensities of different regions of a skin of a person indicative of vital signs of the person; solve an optimization problem to determine frequency coefficients of photoplethysmographic waveforms corresponding to the measured intensities at each of the different regions to reduce a difference between intensities of the skin of different regions reconstructed from the frequency coefficients of corresponding regions and the corresponding measured intensities of the skin while enforcing joint sparsity on the frequency coefficients of different regions, wherein the joint sparsity is enforced on the optimization problem as a constraint specifying that the frequency coefficients of each region have more zero elements than non-zero elements and the non-zero elements of frequency coefficients of different regions belong to the same frequency bins; estimate the vital signs of the person from the frequency coefficients of photoplethysmographic waveforms determined for different regions; and submit the estimated vital signs to a controller of a vehicle configured to execute a control action based on the estimated vital signs of the person driving the vehicle.
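
By way of non-limiting illustration only, the two-one norm recited in claim 17 and the median combination recited in claim 14 may be sketched as follows. The sketch assumes that the frequency coefficients are arranged as a matrix whose rows are frequency bins and whose columns are skin regions; the function names, the threshold value, and the use of coefficient magnitudes for the median are assumptions made for this example and are not taken from the claims. Row-wise soft thresholding is a commonly used proximal step for a two-one norm penalty and is shown here as one possible way of encouraging non-zero coefficients of different regions to fall in the same frequency bins; it is not asserted to be the solver of the claimed system.

```python
import math
from statistics import median


def two_one_norm(coeffs):
    """Two-one norm: sum over frequency bins (rows) of the l2 norm
    taken across regions (columns)."""
    return sum(math.sqrt(sum(abs(c) ** 2 for c in row)) for row in coeffs)


def row_soft_threshold(coeffs, tau):
    """Shrink the l2 norm of every row by tau (hypothetical parameter).
    Rows whose norm is at most tau become all zero, so every region
    loses the same frequency bins together."""
    shrunk = []
    for row in coeffs:
        norm = math.sqrt(sum(abs(c) ** 2 for c in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        shrunk.append([scale * c for c in row])
    return shrunk


def median_coefficients(coeffs):
    """Per frequency bin, take the median coefficient magnitude over
    regions (using magnitudes is an assumption of this sketch)."""
    return [median(abs(c) for c in row) for row in coeffs]


# Toy data: 3 frequency bins (rows) by 2 regions (columns).
X = [[3.0, 4.0],
     [0.1, 0.1],
     [0.0, 0.0]]

Y = row_soft_threshold(X, tau=1.0)   # only the first bin survives
m = median_coefficients(Y)           # one value per frequency bin
dominant_bin = max(range(len(m)), key=lambda k: m[k])
```

In this toy example the weak second bin is zeroed for both regions at once, which is the shared-support behavior described by the joint sparsity language of claims 16, 19 and 20; the index of the dominant bin could then be mapped to a pulse rate given the sampling rate and segment length, which are not specified here.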